Abstract: Tamper-resistance is a fundamental research area in
software security. Many approaches have been proposed to thwart
specific tampering procedures, e.g., obfuscation and
self-checksumming. However, to the best of our knowledge, none of
them can achieve tamper-resistance in theory. Our idea is to impede
the replication of tampering via program diversification, thereby
increasing the complexity of breaking the whole software system. To
this end, we propose to deliver same-featured but functionally
nonequivalent software copies to different machines. We formally
define the problem as N-version obfuscation and provide a viable
means of solving it. Our evaluation shows that the time required to
break a software system increases linearly with the number of
software versions, i.e., with O(n) complexity.


I. Introduction 

Once software has been released, it faces many security
threats. For example, attackers may crack its license protection
mechanism to sell pirated copies, or they may pack a payload
(e.g., advertisements) into the original software for their own
purposes. Nowadays, repackaged apps carrying malicious payloads
have even become a major threat to mobile security [32]. Such
attacks generally rely on reverse engineering techniques to
tamper with the software. To protect software from tampering,
two major approaches have been proposed: obfuscation and
self-checksumming. On the one hand, software can be obfuscated
to deter attackers from locating the target code. On the other
hand, software can embed self-checksumming code to detect
during execution whether it has been tampered with. Current
tamper-resistance work mainly proposes tricks of this kind to
thwart specific tampering tools or approaches, such as using a
loop guarded by an unsolved conjecture to thwart symbolic
execution. Nonetheless, once the trick is recognized, skillful
attackers can design hand-crafted tools to launch an attack. It
seems safe to conjecture that software cannot achieve
tamper-resistance in theory without trusted hardware circuits [6].

A general objective of software tampering is to make the
tampering replicable on other machines. Intuitively, we cannot
guarantee that a piece of software is fully tamper-resistant,
but we can make tampered software fail to execute on machines
other than the attacker's. This idea, known as program
diversification [?], prevents widespread attacks by making
intrusions much harder to replicate. If the attacker wishes to
run the tampered software on another machine, she has to work on
that machine specifically. In this way, we can cut off the
contagion of tampering and limit the scope of potential damage.


According to a recent survey [?], existing software
diversification approaches generally consider functionally
equivalent programs, which can be effective against several kinds
of attacks such as return-oriented programming. However, the
functional equivalence constraint means that such approaches
cannot meet our need to disable the replication of tampered
software. As a first attempt, we propose to deliver same-featured
but functionally nonequivalent software copies to different
machines. A major challenge is deciding which part of a program
can carry such functionally nonequivalent diversity. We formally
define the problem as N-version obfuscation and provide a viable
solution for software with a client-server architecture: by
integrating a message authentication code (MAC) mechanism built
on functionally nonequivalent SHA1 algorithms [?] into the
original software, the software becomes resistant to tampering
replication. We further show that many software integrity
protection problems can be reduced to our solution model. It is
worth noting that N-version obfuscation can be applied
seamlessly to other existing tamper-resistance approaches,
equipping them with the replication-resistant property. Our
analysis shows that the tampering complexity incurred by
N-version obfuscation increases almost linearly with the number
of functionally nonequivalent software versions.

The rest of this paper is organized as follows. We first
discuss the background in Section II. We then present our
approach in Section III and evaluate its effectiveness in
Section IV. Related work is discussed in Section V. Finally,
Section VI concludes the paper.

II. Background 
A. Threat Model and Assumption 

Software delivered to end users is vulnerable to tampering.
Attackers may modify the original program execution logic for
a specific purpose, and then replicate the tampering on other
machines. The modification can be achieved in two ways:

Software Repack: Attackers can manipulate the software
executables directly. For example, they may remove the original
advertisement module embedded in a mobile app installation
file and replace it with another one that benefits themselves.

Dynamic Injection: Attackers may also dynamically inject a
piece of code into the program process so as to manipulate the
loaded program during execution, e.g., using the Linux ptrace
system call. Such an approach is widely adopted by both viruses
and anti-virus software, which inject either inspection code to
monitor the program execution or a back door to control it.
In this work, we consider the hostile host model [25], which
is widely adopted by software tamper-resistance work such
as [?]. We assume that, to launch such tampering attacks,
attackers can use a malicious host to analyze the software and
can fully inspect its execution step by step.


/* Original code: */        /* After obfuscation: */
if (b == 1)                 if (hash(b) == hashvalue)
    foo();                      foo();

(a) One-way function


B. Limitation of Anti-reverse Engineering 

Reverse engineering is a crucial technology for software
tampering. It is the process of analyzing and manipulating
software based on its executables, e.g., files in the Executable
and Linkable Format (ELF). Anti-reversing techniques impede this
process by adding tricks to the executables to fool the
analyzer. Reverse engineering of ELF files generally involves
two phases: a disassembly phase and an analysis phase.
The disassembly phase decodes the ELF binaries into assembly
code, which tools such as IDA can perform automatically. We can
hardly impede the decoding, because eventually the processor has
to be able to decode and execute the file. On the other hand,
many tricks have been proposed to obstruct the analysis phase.
We discuss the major approaches and their limitations in what
follows.

There are two general ways to perform reverse analysis: the
offline approach and the online approach. The offline approach
does not execute the assembly code but analyzes it directly
using reverse engineering tools such as IDA [2]. If a program
has not been properly obfuscated, its control flow graph (CFG)
can be easily derived. The CFG shows the assembly code in blocks
and indicates how control flows between them, which simplifies
the analysis. A CFG greatly helps reverse engineers grasp the
meaning of low-level assembly code, which carries little semantic
information by itself. To protect programs from being analyzed
offline, a few obfuscation approaches have been proposed [?].
The main idea of obfuscation is to add junk code to the original
program while preserving its original functionality. Several
methods have been proposed to achieve this. For example, one can
use an opaque constant to add blocks of junk code that are never
executed; Fig. 1(b) shows such an example. One may further
create an NP-hard problem from a collection of such code, as
discussed in [20]. Another method is to hide the trigger
condition of a code block behind a one-way function, so that a
static analyzer cannot infer whether the block is executed or is
junk code. Fig. 1(a) shows an example that turns an obvious
condition into an opaque condition with a hash function. In
addition, unsolved conjectures have been proposed to obscure the
exit criteria of loops; Fig. 1(c) is an example using the
Collatz conjecture. Such obfuscation techniques make it
considerably harder for general static analysis tools to recover
the CFG and grasp the meaning of the assembly code. However, all
these obfuscation approaches have weaknesses. For example,
opaque constants are vulnerable to symbolic execution backed by
a capable constraint solver, and unsolved conjectures are
vulnerable to custom tools that recognize the patterns of the
conjectures.
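The opaque constant of Fig. 1(b) and the one-way trigger of Fig. 1(a) can be sketched in a few lines of Python. This is only an illustration: the function names `opaque_true` and `triggers`, the use of SHA-256 as the one-way function, and the string encoding of `b` are assumptions, not part of the paper.

```python
import hashlib


def opaque_true(a: int, b: int) -> bool:
    """Opaque predicate in the style of Fig. 1(b): 7*a^2 - 1 != b^2.

    7*a^2 - 1 is congruent to 6 modulo 7, while a square is only ever
    congruent to 0, 1, 2 or 4 modulo 7, so the two sides never match.
    """
    return 7 * a * a - 1 != b * b


# One-way trigger in the style of Fig. 1(a): the guarded value (here 1)
# appears in the program only as a digest, never in clear text.
HASHVALUE = hashlib.sha256(b"1").hexdigest()


def triggers(b: int) -> bool:
    """True when b == 1 (barring a hash collision); compares digests only."""
    return hashlib.sha256(str(b).encode()).hexdigest() == HASHVALUE
```

A static analyzer that does not know the number-theoretic fact sees two feasible branches after `opaque_true`, and one that cannot invert the hash cannot tell which input satisfies `triggers`.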

Even though many powerful offline analysis tools are
available off the shelf, pure offline analysis still suffers from
hard limitations in the face of some anti-reversing protections;
e.g., runtime code unpacking is widely used by malware to escape

int b = getchar();

// always true: 7*a*a - 1 != b*b holds for all integers a and b
if (7 * a * a - 1 != b * b) {
    foo();
} else {
    junk();
}

(b) Opaque constant


int x = 2000;
while (x > 1) {
    if (x % 2 == 1) {
        x = 3 * x + 1;
    } else {
        x = x / 2;
    }
    if (x == 1)
        foo();
}

(c) Collatz Conjecture


Fig. 1. Demonstration of obfuscation approaches with different tricks. The
original code for Fig. 1(b) and Fig. 1(c) is foo();


static analysis. Therefore, adversaries may also execute the
code to obtain instruction traces [24], or debug the code step
by step to perform the analysis, which is known as online
reverse engineering [?]. Such an analysis process is generally
not affected much by obfuscation [11], and adversaries can
leverage system monitoring tools to observe the outcome of
executing a code block, which facilitates the reverse
engineering process. Researchers have suggested setting traps
with anti-debugger code to hinder debugging. For example, one
may check the debug registers to detect whether a debugger is
present, or measure the execution time of a code block to detect
whether it has been paused, and then penalize the debugger
[13], [27]. Again, if the anti-debugger trick is recognized,
adversaries can suppress the check by patching the binaries or
simply switch to another debugger.
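The timing-based trap mentioned above can be sketched as follows. This is a hedged illustration rather than any cited tool's implementation: `run_with_timing_check`, the `budget_s` threshold, and the injectable `clock` parameter are hypothetical names introduced here, and the right threshold would depend on the platform.

```python
import time


def run_with_timing_check(block, budget_s: float, clock=time.perf_counter):
    """Run `block` and report whether it took suspiciously long.

    A block that normally finishes within `budget_s` seconds but takes much
    longer may have been paused by a debugger (single-stepping, breakpoint).
    The `clock` parameter is an assumption added so the check can be tested.
    """
    start = clock()
    result = block()
    elapsed = clock() - start
    return result, elapsed > budget_s
```

As the paragraph notes, such a check is only a speed bump: once recognized, it can be patched out or its clock source can be spoofed.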

Once they have gained enough understanding of the code,
adversaries can manipulate the binaries by adding or deleting
code for a specific purpose while keeping the program
executable. A possible way to detect such code patching is to
use self-checksumming code. The basic idea is to precompute a
relative address value (i.e., the checksum) and let the program
fetch instructions during execution according to that value. If
the region governed by the checksum has been tampered with, the
fetched instructions will be incorrect and the program is likely
to halt [29]. Using overlapping self-checksumming code can
further strengthen the protection. However, the protection can
be defeated by carefully detecting and removing the
self-checksumming code [23] or by exploiting vulnerabilities of
the execution environment [29].
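To make the idea of a guard over code concrete, here is a deliberately simplified Python analog. It is an explicit digest comparison over a function's bytecode, not the address-based scheme described above; `code_digest`, `guard`, `licensed`, and `EXPECTED` are hypothetical names used only for illustration.

```python
import hashlib


def code_digest(func) -> str:
    """Digest of a function's compiled bytecode and its constants."""
    code = func.__code__
    h = hashlib.sha256()
    h.update(code.co_code)
    h.update(repr(code.co_consts).encode())
    return h.hexdigest()


def licensed(days_left: int) -> bool:
    """Example of protected logic an attacker would like to patch."""
    return days_left > 0


# In a real build this value would be precomputed and embedded, not derived
# at run time from the very code it is meant to protect.
EXPECTED = code_digest(licensed)


def guard(func, expected: str) -> bool:
    """True if the function body still matches the recorded digest."""
    return code_digest(func) == expected
```

An explicit comparison like this is exactly the kind of check that can be located and removed, which is why the literature interleaves and overlaps many such testers.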

In other words, there is still no overwhelming anti-reverse 
engineering method, i.e., software can never be made fully 
resistant to tampering without hardware protection. 


III. Our Proposed Approach 

In this section, we introduce the N-version obfuscation 
problem first, and then discuss a possible solution with its 
application scenario in achieving tamper-resistance. 
A. Problem Definition

We formally define the N-version obfuscation problem
as follows: given an algorithm A, how can we automatically
generate a large set of functionally nonequivalent child
algorithms {C_1, ..., C_n}, which are similar to A, together with
their parent algorithm P, so that they satisfy the following two
properties?

Homo Property: when performing the same task, P outputs the
same result as C_i if the gene vector {g_1, ..., g_n} of C_i is
known to P.

Divergence Property: when performing the same task, C_i and
C_j (i ≠ j) generally output different results.

Suppose the software has a client-server architecture.
We can deploy the parent algorithm on the server side and
deliver a unique child algorithm to each client. In this way,
the software distributed to clients exhibits functionally
nonequivalent diversity by the divergence property,
while the homo property enables the server to handle that
diversity.

B. N-Version Obfuscated SHA1 

In this section, we show a viable means of solving the
N-version obfuscation problem with the SHA1 algorithm. Our
approach leverages the rounds of computation performed by
SHA1 to generate functionally nonequivalent diversity.

The main loop of the original SHA1 (Algorithm 1) consists of
80 rounds. Each round takes one word of the plaintext
block (w[i]) into the calculation. Every twenty rounds, the
calculation (the equation for generating f and the value of
k) switches to another one. Even though there are security
considerations behind the choice of a specific calculation for
each round, no evidence shows that the algorithm would suffer
a great security degradation if we swap them with each other.
Therefore, we can diversify the original SHA1 algorithm by
choosing different sequences of equations for generating f and
values of k, which serve as the genes of individuals. We can also
design a parent algorithm that receives the genes of a
child and processes the input data according to those genes.
Algorithm 2 shows the parent algorithm we designed.
In Algorithm 2, the pointer array of equations (fp[80]) for
generating f and the value array of k for the 80 rounds
are passed to the algorithm as the genes of a child.
Clearly, given the same input w[80], the parent
algorithm computes the same result as a child when fp[80]
and k[80] are set to that child's genes.
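A minimal Python sketch of the parent algorithm follows. It makes one encoding assumption that differs from Algorithm 2: genes are small integers 0..3 that index tables of the four equations and the four constants, rather than function pointers and constant values. The names `parent_sha1`, `F`, `K`, and `STANDARD` are illustrative; padding and the message schedule follow standard SHA1.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & MASK


# F0..F3 as in Algorithm 2 (F1 and F3 are the same equation in SHA1).
F = [
    lambda b, c, d: (b & c) | (~b & d),
    lambda b, c, d: b ^ c ^ d,
    lambda b, c, d: (b & c) | (b & d) | (c & d),
    lambda b, c, d: b ^ c ^ d,
]
K = [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6]


def parent_sha1(message: bytes, fp: list, k: list) -> str:
    """SHA1 whose 80 rounds are driven by the gene arrays fp[80] and k[80]."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Standard SHA1 padding: 0x80, zeros, then the 64-bit message bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for i in range(16, 80):
            w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        a, b, c, d, e = h
        for i in range(80):
            f = F[fp[i]](b, c, d) & MASK          # "Call fp[i]"
            temp = (rotl(a, 5) + f + e + K[k[i]] + w[i]) & MASK  # F_TAIL
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)


# Genes of the unmodified algorithm: option j for rounds 20*j .. 20*j + 19.
STANDARD = [i // 20 for i in range(80)]
```

With `STANDARD` passed for both gene arrays the function reproduces ordinary SHA1, which is a convenient sanity check; a child is simply the same loop with its own `fp` and `k` hardcoded, so the server can match it by supplying those genes (the homo property).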

C. Implementation 

We automate the generation of N-version SHA1
algorithms based on the Low Level Virtual Machine (LLVM) [4],
a widely used open-source compiler infrastructure that supports
extensions. LLVM first represents the source code as an
abstract syntax tree (AST), then lowers it into an
intermediate representation (IR), which is finally compiled
into executables for a specific platform. LLVM
supports plugins and LibTooling, which can manipulate
the source code of a target AST branch during the compilation
process. We hence build a custom LibTooling tool that
automatically generates the N-version obfuscated algorithms.


for i = 0; i < 80; i++ do
    if 0 ≤ i ≤ 19 then
        f ← (b AND c) OR ((NOT b) AND d);
        k ← 0x5A827999;
    end
    if 20 ≤ i ≤ 39 then
        f ← b XOR c XOR d;
        k ← 0x6ED9EBA1;
    end
    if 40 ≤ i ≤ 59 then
        f ← (b AND c) OR (b AND d) OR (c AND d);
        k ← 0x8F1BBCDC;
    end
    if 60 ≤ i ≤ 79 then
        f ← b XOR c XOR d;
        k ← 0xCA62C1D6;
    end
    temp ← (a LEFTROTATE 5) + f + e + k + w[i];
    e ← d;
    d ← c;
    c ← b LEFTROTATE 30;
    b ← a;
    a ← temp;
end

Algorithm 1: The main loop of SHA1


Data: fp[80], k[80], w[80]
for i = 0; i < 80; i++ do
    Call fp[i];    // Pointer to F0, F1, F2 or F3
    F_TAIL(k[i], w[i]);
end

Function F0()
    f ← (b AND c) OR ((NOT b) AND d);
Function F1()
    f ← b XOR c XOR d;
Function F2()
    f ← (b AND c) OR (b AND d) OR (c AND d);
Function F3()
    f ← b XOR c XOR d;
Function F_TAIL(k, w)
    temp ← (a LEFTROTATE 5) + f + e + k + w;
    e ← d;
    d ← c;
    c ← b LEFTROTATE 30;
    b ← a;
    a ← temp;

Algorithm 2: A parent algorithm for SHA1


According to Algorithm 2, each gene (either fp[i] or k[i])
has four possible values, so we use two bits to represent a gene.
In each compilation, we first randomly generate two 160-bit
sequences: one as the chromosome for the equation
function pointers (i.e., fp[80]) and the other as the chromosome
for the value options of k (i.e., k[80]). We then replace the
corresponding AST branches with hardcoded equation function
pointers and settings of k.
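The chromosome encoding can be sketched as follows, assuming that gene i occupies bits 2i and 2i+1 of the 160-bit value (the paper does not state the bit order). The helper names `random_chromosome`, `decode`, and `encode` are hypothetical.

```python
import secrets

ROUNDS = 80
BITS_PER_GENE = 2


def random_chromosome() -> int:
    """A random 160-bit chromosome (80 genes of two bits each)."""
    return secrets.randbits(ROUNDS * BITS_PER_GENE)


def decode(chromosome: int) -> list:
    """Split a 160-bit chromosome into 80 gene values in the range 0..3."""
    return [(chromosome >> (BITS_PER_GENE * i)) & 0b11 for i in range(ROUNDS)]


def encode(genes: list) -> int:
    """Inverse of decode: pack 80 two-bit genes into one integer."""
    return sum(g << (BITS_PER_GENE * i) for i, g in enumerate(genes))


fp_genes = decode(random_chromosome())  # which of F0..F3 each round calls
k_genes = decode(random_chromosome())   # which of the four constants it uses
```

Two independent 160-bit chromosomes give 2^320 possible gene settings, although not all of them are functionally distinct because F1 and F3 are the same equation.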

It is worth noting that the N-version obfuscation approach
itself does not provide any resistance to reverse engineering.
However, our approach can be seamlessly integrated with
other anti-reverse engineering protections, for example, the
obfuscation approach proposed in [20], which composes NP-hard
problems from function pointers and opaque constants.
Fig. 2. Application of N-version obfuscated program in tamper resistance.

Fig. 3. An exemplary Android app (com.sankuai.meituan), which
has been injected with an LBE library (com.lbe.../client.jar). By
checking the maps file of the app process (pid: 3789), we can detect the
tampering.

D. Application Discussion 

Many client-server software systems can adopt the N-version
obfuscation idea by implementing a MAC mechanism with
the obfuscated algorithm (e.g., SHA1). MAC is a popular
mechanism in client-server computing architectures
for checking the integrity and authenticity of messages. When a
client sends a request to the server, it calculates the MAC of
the request and appends it to the original request. The server
validates the MAC first and then processes the request. Since
a hash function is a major component of a MAC algorithm,
the N-version obfuscated SHA1 algorithms can be adopted.
Fig. 2 illustrates such a mechanism.

In Fig. 2, each client embeds a unique SHA1-based MAC
calculation algorithm. To issue a request to the server
successfully, the client has to send its identification (such
as a machine serial number or user ID), the request, and the
MAC together. The server looks up the genes of the client in
its local N-version database according to the client's
identification and then verifies the MAC. Such diverse programs
can be distributed by implementing the MAC as mobile code
(i.e., a dynamic library) that the server delivers upon
request. In other words, the client software is first launched
without the library and then requests the library from the
server. The server randomly chooses a library from a pool of
precompiled libraries and delivers it to the client; meanwhile,
it records the mapping between the genes of the client and its
unique identification in the N-version obfuscation database. In
this way, the server can verify the MAC generated by each
client according to the homo property.
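The delivery and verification flow can be sketched as below. Two simplifications are assumptions of this sketch: `variant_mac` is a stand-in that feeds the genes into ordinary SHA1 as data so the example runs on its own (in the paper the genes select the round equations and constants of the hash itself), and the server draws fresh random genes instead of picking from a pool of precompiled libraries. `Server`, `deliver_library`, and `verify` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def variant_mac(genes: bytes, request: bytes) -> str:
    # STAND-IN (assumption): hashes the genes together with the request.
    # The paper's scheme instead hardcodes the genes into the SHA1 rounds.
    return hashlib.sha1(genes + request).hexdigest()


class Server:
    def __init__(self):
        self.db = {}  # N-version database: client identification -> genes

    def deliver_library(self, client_id: str) -> bytes:
        """Pick a variant for a new client and record the mapping."""
        genes = secrets.token_bytes(40)  # two 160-bit chromosomes
        self.db[client_id] = genes
        return genes  # stands for the dynamic library built from these genes

    def verify(self, client_id: str, request: bytes, mac: str) -> bool:
        """Recompute the MAC with the genes recorded for this client."""
        genes = self.db.get(client_id)
        if genes is None:
            return False
        expected = variant_mac(genes, request)
        return hmac.compare_digest(expected, mac)  # constant-time compare
```

A guard copied from one device to another produces MACs under the wrong genes, so `verify` fails for the second device's identification; this is the replication check the protocol relies on.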

We hence provide a tamper-resistance capability for the client
software based on the N-version obfuscation library, which is
resistant to replication by the divergence property.
To this end, a viable means is to implement an integrity
checking function alongside the MAC in the library, so that
the library can serve as a security guard for the software.
By interleaving the code of the integrity checking function
with the MAC algorithm, the integrity check is
triggered whenever a MAC is calculated. Algorithm 3 shows an
exemplary integrity checking function for Android
apps. The function walks through the maps file of
the app process itself, which records the program segments
and their addresses in memory. It then compares each record

Data: dict<segment>
// A list of predefined segments with name and size

Function IntegrityChk()
    pid ← getpid();
    file ← open(/proc/pid/maps);
    while line ← readline(file) != EOF do
        segName ← GetSegName(line);
        segSize ← GetSegSize(line);
        if !dict.contains(segName) then
            Reaction();
        else
            if dict.getsize(segName) != segSize then
                Reaction();
            end
        end
    end

Algorithm 3: An exemplary integrity checking function


with a standard dictionary predefined by the developers.
If there are abnormal segments in the maps file, i.e., the
integrity has been violated, a response mechanism can be
triggered. Such an approach is effective in detecting both the
software repacking and the dynamic injection attacks discussed
in the threat model. Fig. 3 shows an exemplary app that has been
tampered with by LBE (a commercial security software for
Android [?]). Algorithm 3 can detect such tampering by finding
that com.lbe.../client.jar is an abnormal segment.
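The check in Algorithm 3 can be sketched in Python as follows, assuming the usual Linux maps line layout (address range, permissions, offset, device, inode, optional path). The function names and the `allowed` table are hypothetical, and the sketch returns the offending names instead of calling a reaction routine.

```python
def parse_maps(text: str) -> list:
    """Return (name, size) for every named mapping in a maps listing."""
    segments = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping (no path) or blank line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        segments.append((fields[5].strip(), end - start))
    return segments


def integrity_violations(text: str, allowed: dict) -> list:
    """Names of mappings that are unknown or whose size differs."""
    bad = []
    for name, size in parse_maps(text):
        if name not in allowed or allowed[name] != size:
            bad.append(name)
    return bad


def read_own_maps() -> str:
    """Linux/Android only: the maps file of the current process."""
    with open("/proc/self/maps") as fh:
        return fh.read()
```

Calling `integrity_violations(read_own_maps(), allowed)` from inside the guard mirrors the loop of Algorithm 3. A production table would need more than one size per name, since a shared library is typically mapped as several segments.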

If an attacker has successfully tampered with one copy of the
guard (e.g., by removing the integrity checking function) and
replicated it on other machines, the server can detect the
replication because of an incorrect MAC, i.e., an inconsistent
mapping between the identification and the genes. We may further
implement a reaction mechanism that renews the guard or crashes
the client software directly.

A natural question is why we do not simply use different
keys to create diversity. For example, we could use a
keyed-hash message authentication code (HMAC) algorithm
and hardcode a unique symmetric key into each client library.
Such an approach is also effective, but it is more
vulnerable than ours, because hiding a key (i.e., white-box
cryptography) is more difficult than hiding the program logic
[?].
IV. Evaluation

The goal of our work is to impede tampering replication
by creating diverse software instances, and thus to increase the
complexity of tampering with the software system. In this section,
we evaluate the complexity that N-version obfuscation adds
to tampering with multiple software clients.

Suppose a program has adopted the protection mechanism
discussed in Section III-D. A competent attacker wishes to
manipulate the program binaries for a specific purpose by
adding, deleting, or modifying code. According to the
adversary analysis, the guard cannot simply be removed from or
disabled in the app, because the MAC mechanism residing in
the guard needs to be executed. However, in a hostile host
environment, the software can be fully inspected. Through
careful analysis, the attacker can identify that the protection
lies in the integrity checking function, i.e., Algorithm 3. She
can disable the check by carefully removing the function or
suppressing the reaction. Without N-version obfuscation, the
attacker can deliver the tampered software to other users; the
replicated copies execute normally on other machines, and the
whole software system is broken. With N-version obfuscation, if
the attacker simply repacks the app with the tampered guard,
message verification fails and the program cannot
function normally. To break the whole software system, the
attacker needs to analyze and tamper with the guard of each
machine individually.

To tamper with the software on another machine, the
attacker has to obtain the guard on that machine, develop a
tampered guard, and then substitute it for the original one. If
the guard is protected with interleaved self-checksumming code
[?], a successful tampering requires removing all the
self-checksumming code at the same time, which is unlikely
to succeed without sophisticated analysis. Existing approaches
to identifying such code generally require dynamic
analysis and taint analysis together [23]. Empirically, the time
required to tamper with each guard is not negligible.

Let c_0 denote the time needed to analyze one software
copy and tamper with it on the attacker's own hostile host. This
cost is O(1) and equals the cost of tampering with one
software copy without N-version obfuscation. Let c_1 denote
the time needed to obtain the guard on another machine so
as to replace it, and c_2 the time needed to tamper with it.
If the attacker wishes to tamper with the software on n machines,
the total time can be estimated as c_0 + n * (c_1 + c_2). Because
of the interleaved self-checksumming code, c_2 is not
negligible; hence the complexity is O(n).

Another possible tampering approach is to build an
algorithm similar to the parent algorithm, which can calculate
the hash value according to the genes of a specific child. To
this end, the attacker has to recover the gene setting of
each guard. She may compare two implementations and locate the
genes from their differences. If the attacker has
acquired enough knowledge of our N-version obfuscation
theory and implementation, such an attack is theoretically
possible. However, if the guard is obfuscated (e.g., with [20]),
it would be very difficult. To the best of our knowledge,
existing work on breaking such obfuscated programs requires
either symbolic execution with sophisticated constraint solvers
or complicated taint analysis [?], which is computationally
intensive and time-consuming. Let c_3 denote the time needed to
extract the genes of a guard. The time needed to tamper with the
software system can be estimated as c_0 + n * (c_1 + c_3).
Because c_3 is not negligible, the complexity is still O(n).
Note that n can be made arbitrarily large, as the obfuscation
task can be fully automated.
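The two estimates can be written side by side to make the linear growth explicit. The labels T_patch and T_extract are introduced here for illustration only, and the limits assume that c_1, c_2, and c_3 are positive constants that do not depend on n (the `align*` environment requires the amsmath package).

```latex
\begin{align*}
T_{\text{patch}}(n)   &= c_0 + n\,(c_1 + c_2), \\
T_{\text{extract}}(n) &= c_0 + n\,(c_1 + c_3), \\
\lim_{n\to\infty} \frac{T_{\text{patch}}(n)}{n} &= c_1 + c_2 > 0,
\qquad
\lim_{n\to\infty} \frac{T_{\text{extract}}(n)}{n} = c_1 + c_3 > 0 .
\end{align*}
```

Under these assumptions both totals grow linearly in n, so the one-off analysis cost c_0 is amortized while the per-machine cost is not.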

V. Related Work 

In this section, we first review important work in the reverse
engineering field, and then discuss work that applies
automated program diversity to improve program security.

A. Reverse Engineering 

Software protection has been a research problem for decades.
The proposed solutions generally fall into two groups:
hardware-assisted solutions, which provide better security
assurance, and pure software solutions, which adapt better
to general hardware [?]. For our research problem,
hardware-assisted solutions are not applicable because
they require specific hardware, so we mainly discuss
pure software solutions.

The literature on software protection with anti-reverse
engineering approaches serves different purposes. While some
researchers look for protections against piracy [?]
and intrusion [?], others investigate how malware can evade
detection [?], [26]. However, they share a
common set of protection techniques with only slight differences.
Obfuscation is a basic software protection approach. It
complicates the binaries and increases the difficulty of
reverse engineering. Ogiso et al. propose to obfuscate
code by constructing an NP-hard problem that
requires determining the real function pointer from an array
of pointers [20]. However, such an approach is vulnerable to
symbolic execution with constraint solvers. To thwart symbolic
execution, Sharif et al. observe that some code blocks can
be concealed by guarding them with a trigger condition based on
a one-way function, which the constraint solver cannot solve
[26]. Wang et al. propose another obfuscation technique that
combats symbolic execution by exploiting the general limitation
of symbolic execution tools in analyzing loops. Their idea is
to use unsolved conjectures [28] to obscure the termination
condition of loops. The approach is vulnerable once the
tricks of unsolved conjectures are recognized. Rather than
setting tricks in the source code, Linn et al. propose to
obfuscate the binaries directly by inserting error bits
that are automatically corrected during execution by
the CPU but not by current disassembly tools [17]. The
security of such a protection is very limited and vulnerable
to dynamic analysis, i.e., the actual instruction trace can
be easily obtained once the software is executed. To
deter dynamic analysis with debuggers, Oishi et al. propose
to use camouflaged anti-debuggers, which, however,
are not effective against custom debuggers. For protecting
software from tampering, another popular approach
is to detect unauthorized modifications at runtime
by employing a self-checksumming mechanism [5], [14]. The
self-checksumming mechanism uses redundant, overlapping
checksum testers inside the program to verify its integrity. On
the other side, several investigations focus on defeating these
protections [19], [23], [29], [30]. Wurster et al. propose to
defeat the self-checksumming approach with a duplicated memory
attack and examine its effectiveness on several popular CPU
types [29]. Qiu et al. propose to identify self-checksumming
code using taint analysis [23]. Yadegari et al. find
a more general way of deobfuscating an obfuscated algorithm
[?]. Our work differs from all the existing work in that
we focus on impeding the replication of software tampering.

B. Automated Software Diversification 

The idea of software diversity was initially proposed for
software reliability engineering [?]. Cohen [?] was the first
to propose creating functionally equivalent programs to enhance
software security. Forrest et al. [?] also state that the
beneficial effects of diversity in computing systems have
been overlooked, and that introducing diversity into computer
systems can make them more robust to replicated attacks.
They propose several possible ways to create such diversity
with respect to program behavior, including adding
nonfunctional code, refactoring code, and diversifying the
memory layout. Crane et al. [?] build upon fine-grained
code diversification to prevent code-reuse attacks.
They adopt function permutation [?], register allocation
randomization, and callee-saved register save slot reordering
[22] in the diversification process. However, to the best of our
knowledge, none of these works considers automatically
generating functionally nonequivalent programs to improve
tamper-resistance.

VI. Conclusion 

This work has investigated software tamper-resistance
with N-version obfuscation. We have formally defined the
N-version obfuscation problem and provided a viable solution
based on the SHA1 algorithm. We have further discussed the
application of the N-version obfuscation idea to the
client-server software architecture. By introducing functionally
nonequivalent diversity, our approach effectively impedes
the replication of tampering. Our evaluation shows that
the complexity of tampering with the software system increases
linearly with the number of software versions, which can
be generated automatically at trivial cost.